---
title: Upgrading PXF When you Upgrade Greenplum 5
---

If you are using PXF in your current Greenplum Database 5.x installation, you must perform some PXF upgrade actions when you upgrade to a new version of Greenplum Database 5.x.

The PXF upgrade procedure describes how to upgrade PXF in your Greenplum Database installation. This procedure uses *PXF.from* to refer to your currently-installed PXF version and *PXF.to* to refer to the PXF version installed when you upgrade to the new version of Greenplum Database.

Most PXF installations do not require modifications to PXF configuration files and should experience a seamless upgrade.

**Note**: Starting in Greenplum Database version 5.12, PXF no longer requires a Hadoop client installation. PXF now bundles all of the JAR files on which it depends, and loads these JARs at runtime.

The PXF upgrade procedure has two parts. You perform one procedure before, and one procedure after, you upgrade to a new version of Greenplum Database:

-   [Step 1: PXF Pre-Upgrade Actions](#pxfpre)
-   Upgrade to a new Greenplum Database version
-   [Step 2: Upgrading PXF](#pxfup)


## <a id="pxfpre"></a>Step 1: PXF Pre-Upgrade Actions

Perform this procedure before you upgrade to a new version of Greenplum Database:

1. Log in to the Greenplum Database master node. For example:

    ``` shell
    $ ssh gpadmin@<gpmaster>
    ```

2. Identify and note the *PXF.from* version number. For example:

    ``` shell
    gpadmin@gpmaster$ pxf version
    ```

3. Stop PXF on each segment host as described in [Stopping PXF](../../pxf/5-15/cfginitstart_pxf.html#stop_pxf).

4. **If you are upgrading from Greenplum Database version 5.14 or earlier**:

    1. Back up the *PXF.from* configuration files found in the `$GPHOME/pxf/conf/` directory. These files should be the same on all segment hosts, so you need only copy from one of the hosts. For example:

        ``` shell
        gpadmin@gpmaster$ mkdir -p /save/pxf-from-conf
        gpadmin@gpmaster$ scp gpadmin@seghost1:/usr/local/greenplum-db/pxf/conf/* /save/pxf-from-conf/
        ```

    2. Note the locations of any custom JAR files that you may have added to your *PXF.from* installation. Save a copy of these JAR files.

6. Upgrade to the new version of Greenplum Database and then continue your PXF upgrade with [Step 2: Upgrading PXF](#pxfup).


## <a id="pxfup"></a>Step 2: Upgrading PXF

After you upgrade to the new version of Greenplum Database, perform the following procedure to upgrade and configure the *PXF.to* software:

1. Log in to the Greenplum Database master node. For example:

    ``` shell
    $ ssh gpadmin@<gpmaster>
    ```

2. If you installed the PXF `rpm` on your Greenplum 6 hosts:

    1. Copy the PXF extension files from the PXF installation directory to the new Greenplum 6 install directory:

        ``` shell
        gpadmin@gpmaster pxf cluster register
        ```

    2. Start PXF on each segment host as described in [Starting PXF](../../pxf/5-15/cfginitstart_pxf.html#start_pxf).

    3. Exit this procedure.

2. Initialize PXF on each segment host as described in [Initializing PXF](../../pxf/5-15/init_pxf.html).

3. PXF user impersonation is on by default in Greenplum Database version 5.5.0 and later. If you are upgrading from an older *PXF.from* version, you must configure user impersonation for the underlying Hadoop services. Refer to [Configuring User Impersonation and Proxying](../../pxf/5-15/pxfuserimpers.html) for instructions, including the configuration procedure to turn off PXF user impersonation.

4. **If you are upgrading from Greenplum Database version 5.14 or earlier**:

    1. If you updated the `pxf-env.sh` configuration file in your *PXF.from* installation, re-apply those changes to `$PXF_CONF/conf/pxf-env.sh`. For example:

        ``` shell
        gpadmin@gpmaster$ vi $PXF_CONF/conf/pxf-env.sh
           <update the file>
        ```
    2. Similarly, if you updated the `pxf-profiles.xml` configuration file in your *PXF.from* installation, re-apply those changes to `$PXF_CONF/conf/pxf-profiles.xml` on the master host.

        **Note:** Starting in Greenplum Database version 5.12, the package name for PXF classes was changed to use the prefix `org.greenplum.*`. If you are upgrading from an older *PXF.from* version and you customized the `pxf-profiles.xml` file, you must change any `org.apache.hawq.pxf.*` references to `org.greenplum.pxf.*` when you re-apply your changes.
    3. If you updated the `pxf-log4j.properties` configuration file in your *PXF.from* installation, re-apply those changes to `$PXF_CONF/conf/pxf-log4j.properties` on the master host.
    4. If you updated the `pxf-public.classpath` configuration file in your *PXF.from* installation, copy every JAR referenced in the file to `$PXF_CONF/lib` on the master host.
    5. If you added additional JAR files to your *PXF.from* installation, copy them to `$PXF_CONF/lib` on the master host.
    5. Starting in Greenplum Database version 5.15, PXF requires that the Hadoop configuration files reside in the `$PXF_CONF/servers/default` directory. If you configured PXF Hadoop connectors in your *PXF.from* installation, copy the Hadoop configuration files in `/etc/<hadoop_service>/conf` to `$PXF_CONF/servers/default` on the Greenplum Database master host.
    5. Starting in Greenplum Database version 5.15, the default Kerberos keytab file location for PXF is `$PXF_CONF/keytabs`. If you previously configured PXF for secure HDFS and the PXF keytab file is located in a *PXF.from* installation directory (for example, `$GPHOME/pxf/conf`), consider relocating the keytab file to `$PXF_CONF/keytabs`. Alternatively, update the `PXF_KEYTAB` property setting in the `$PXF_CONF/conf/pxf-env.sh` file to reference your keytab file.

5. **If you are upgrading from Greenplum Database version 5.18 or earlier**:

    1. PXF now bundles Hadoop version 2.9.2 dependent JAR files. If you registered additional Hadoop-related JAR files in `$PXF_CONF/lib`, ensure that these libraries are compatible with Hadoop version 2.9.2.

    2. The PXF JDBC connector now supports file-based server configuration. If you choose to utilize this new feature with existing external tables that reference an external SQL database, refer to the configuration instructions in [Configuring the JDBC Connector](../../pxf/5-15/jdbc_cfg.html) for additional information.

    2. The PXF JDBC connector now supports statement query timeouts. This feature requires explicit support from the JDBC driver. The default query timeout is `0`, wait forever. Some JDBC drivers support a `0` timeout value without fully supporting the statement query timeout feature. Ensure that any JDBC driver that you have registered supports the default timeout, or better yet, fully supports this feature. You may need to update your JDBC driver version to obtain this support. Refer to the configuration instructions in [JDBC Driver JAR Registration](../../pxf/5-15/jdbc_cfg.html#cfg_jar) for information about registering JAR files with PXF.

    3. If you plan to use Hive partition filtering with integral types, you must set the `hive.metastore.integral.jdo.pushdown` Hive property in your Hadoop cluster's `hive-site.xml` and in your PXF user configuration `$PXF_CONF/servers/default/hive-site.xml` file. See [Updating Hadoop Configuration](../../pxf/5-15/client_instcfg.html#client-cfg-update).

5. **If you are upgrading from Greenplum Database version 5.21.1 or earlier**: The PXF Hive connector no longer supports providing a `DELIMITER=<delim>` option in the `LOCATION` URI when you create an external table that specifies the `HiveText` or `HiveRC` profiles. If you have previously created an external table that specified a `DELIMITER` in the `LOCATION` URI, you must drop the table, and recreate it omitting the `DELIMITER` from the `LOCATION`. You are still required to provide a non-default delimiter in the external table formatting options.

5. **If you are upgrading from Greenplum Database version 5.23 or earlier** and you have configured any JDBC servers that access Kerberos-secured Hive, you must now set the `hadoop.security.authentication` property in the `jdbc-site.xml` file to explicitly identify use of the Kerberos authentication method. Perform the following for each of these server configs:

    1. Navigate to the server configuration directory.
    2. Open the `jdbc-site.xml` file in the editor of your choice and uncomment or add the following property block to the file:

        ```xml
        <property>
            <name>hadoop.security.authentication</name>
            <value>kerberos</value>
        </property>
        ```
    3. Save the file and exit the editor.


5. **If you are upgrading from Greenplum Database version 5.26 or earlier**: The PXF `Hive` and `HiveRC` profiles now support column projection using column name-based mapping. If you have any existing PXF external tables that specify one of these profiles, and the external table relied on column index-based mapping, you may be required to drop and recreate the tables:

    1. Identify all PXF external tables that you created that specify a `Hive` or `HiveRC` profile.

    2. For *each* external table that you identify in step 1, examine the definitions of both the PXF external table and the referenced Hive table. If the column names of the PXF external table *do not* match the column names of the Hive table:

        1. Drop the existing PXF external table. For example:

            ``` sql
            DROP EXTERNAL TABLE pxf_hive_table1;
            ```

        2. Recreate the PXF external table using the Hive column names. For example:

            ``` sql
            CREATE EXTERNAL TABLE pxf_hive_table1( hivecolname int, hivecolname2 text )
              LOCATION( 'pxf://default.hive_table_name?PROFILE=Hive')
            FORMAT 'custom' (FORMATTER='pxfwritable_import');
            ```

        3. Review any SQL scripts that you may have created that reference the PXF external table, and update column names if required.


5. Synchronize the PXF configuration from the master host to the standby master and each Greenplum Database segment host. For example:

    ``` shell
    gpadmin@gpmaster$ $GPHOME/pxf/bin/pxf cluster sync
    ```
 
6. Start PXF on each segment host as described in [Starting PXF](../../pxf/5-15/cfginitstart_pxf.html#start_pxf).

